♦ ABSTRACT

Optimizing programs at run-time provides opportunities to apply aggressive c 
programs based on information that was not available at compile time. At run
be adapted to better exploit architectural features, optimize the use of dynamii 
simplify code based on run-time constants. Our profiling system provides a fh
collecting information required for performing run-time optimization. We san 
hardware registers available on an Itanium processor, and select a set of code 
to important performance-events. We gather distribution information about th 
we wish to monitor, and test our traces by estimating the ability for dynamic
to execute run-time generated traces. Our results show that we are able to capt
time across various SPEC2000 integer benchmarks using our profile and pate 
relatively small number of frequently executed execution paths. Our profiling 
overhead increases execution time by only 2-4%.
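
The abstract above reports that a relatively small number of frequently executed paths accounts for much of the execution time. The selection procedure itself is not given here, so the following is only a minimal sketch of one plausible approach: greedily keep the most frequently sampled paths until a target share of all samples is covered. The function name `select_hot_paths`, the `target_coverage` parameter, and the representation of a path as a plain label are assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def select_hot_paths(path_samples, target_coverage=0.9):
    """Greedily pick the most frequently sampled paths.

    path_samples: iterable of hashable path identifiers, one per profile
        sample (hypothetical stand-in for hardware-sampled path data).
    target_coverage: fraction of all samples the selected paths should cover.
    Returns (selected_paths, achieved_coverage).
    """
    counts = Counter(path_samples)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0

    selected, covered = [], 0
    # most_common() yields paths from most to least frequently sampled.
    for path, n in counts.most_common():
        if covered / total >= target_coverage:
            break
        selected.append(path)
        covered += n
    return selected, covered / total


# Example: three paths, where two of them account for 90% of the samples.
samples = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
hot_paths, coverage = select_hot_paths(samples, target_coverage=0.9)
```

In this toy input, paths `A` and `B` are selected and path `C` is left out, which mirrors the idea that a few hot paths can dominate a profile.
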
♦ ABSTRACT

Prefetching data ahead of use has the potential to tolerate the growing processor-memory
performance gap by overlapping long latency memory accesses with useful computation. While
sophisticated prefetching techniques have been automated for limited domains, such as scientific
codes that access dense arrays in loop nests, a similar level of success has eluded general-purpose
programs, especially pointer-chasing codes written in languages such as C and C++. We address
this problem by describing, implementing and evaluating a dynamic prefetching scheme. Our
technique runs on stock hardware, is completely automatic, and works for general-purpose
programs, including pointer-chasing codes written in weakly-typed languages, such as C and C++.
It operates in three phases. First, the profiling phase gathers a temporal data reference profile from
a running program with low overhead. Next, the profiling is turned off and a fast analysis
algorithm extracts hot data streams, which are data reference sequences that frequently repeat in
the same order, from the temporal profile. Then, the system dynamically injects code at
appropriate program points to detect and prefetch these hot data streams. Finally, the process
enters the hibernation phase where no profiling or analysis is performed, and the program
continues to execute with the added prefetch instructions. At the end of the hibernation phase, the
program is de-optimized to remove the inserted checks and prefetch instructions, and control
returns to the profiling phase. For long-running programs, this profile, analyze and optimize,
hibernate cycle will repeat multiple times. Our initial results from applying dynamic prefetching
are promising, indicating overall execution time improvements of 5-19% for several memory-
performance-limited SPECint2000 benchmarks running their largest (ref) input.
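
The profile, analyze, and optimize steps described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the system's implementation: the abstract does not name the analysis algorithm, so fixed-length window counting stands in for it, and "prefetching" is modeled as adding addresses to a set that is never evicted. All function names and parameters (`length`, `min_count`, `prefix_len`) are hypothetical.

```python
from collections import Counter


def extract_hot_streams(trace, length=4, min_count=2):
    """Analysis stand-in: find fixed-length reference sequences that
    repeat in the same order at least min_count times in the profile."""
    windows = Counter(
        tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)
    )
    return [seq for seq, n in windows.items() if n >= min_count]


def build_prefetch_table(hot_streams, prefix_len=2):
    """Optimization stand-in: map each stream prefix to the remaining
    addresses of that stream, which are prefetched once the prefix is seen."""
    table = {}
    for seq in hot_streams:
        table.setdefault(seq[:prefix_len], set()).update(seq[prefix_len:])
    return table


def run_with_prefetch(trace, table, prefix_len=2):
    """Replay a trace and count references that were already prefetched.
    Simplification: prefetched addresses are never evicted."""
    prefetched, hits = set(), 0
    for i, addr in enumerate(trace):
        if addr in prefetched:
            hits += 1
        # Check whether the most recent references match a stream prefix.
        recent = tuple(trace[max(0, i - prefix_len + 1): i + 1])
        prefetched.update(table.get(recent, ()))
    return hits


# Profiling phase: a recorded data reference trace (toy addresses).
profile = ["a", "b", "c", "d", "x", "a", "b", "c", "d", "y"]
hot = extract_hot_streams(profile)
table = build_prefetch_table(hot)

# Later execution with the prefix checks and prefetches in place.
hits = run_with_prefetch(["a", "b", "c", "d", "z", "a", "b", "c", "d"], table)
```

Here the sequence `a, b, c, d` is the only hot data stream, so observing `a, b` triggers a prefetch of `c` and `d`. The hibernation and de-optimization steps are not modeled; in this sketch they would correspond to discarding `table` and recording a fresh profile.
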